On Zero-Error Source Coding with Feedback 



Mayank Bakshi Michelle Effros 
Department of Electrical Engineering 
California Institute of Technology 
Pasadena, California 91125, USA
Email: {mayank, effros}@caltech.edu






Abstract: We consider the problem of zero-error source coding with limited feedback when side information is present at the receiver. First, we derive an achievable rate region for arbitrary joint distributions on the source and the side information. When all pairs of source and side information symbols are observable with non-zero probability, we show that this characterization gives the entire rate region. Next, we demonstrate a class of sources for which asymptotically zero feedback suffices to achieve zero-error coding at the rate promised by the Slepian-Wolf bound for asymptotically lossless coding. Finally, we illustrate these results with the aid of three simple examples.



I. Introduction 

Traditionally, most systems considered in the network source coding literature assume that the flow of information is unidirectional. However, networks that occur in practice often comprise bidirectional links. For example, in a network with wireless links, it is possible for any pair of nodes to act as a transmitter-receiver pair. In such networks, if there is only one source and one sink, then assuming a unidirectional flow of information is not restrictive from a source coding point of view, as transmission from the sink to the source cannot supply the source with any useful information. In contrast, if the network consists of multiple sources or sinks, then feedback from sinks to sources has the potential to alter the forward rate region [1].

Unfortunately, fully characterizing the rate tradeoffs for networks with bidirectional links is a non-trivial problem. In part, the difficulty arises from the fact that in such networks, a potentially unbounded number of transmissions may be required to achieve optimal rates [2], and the usual information theoretic techniques do not readily extend to these situations.

In this paper, we aim to develop insights into these systems by studying a simple network. The setup that we consider is shown in Figure 1. The source process $X$ is observed at $E_x$ and demanded at $E_y$. Terminal $E_y$ observes source $Y$ jointly distributed with $X$. We wish to characterize the set of rates $(R_X, R_Y)$ required to enable $E_y$ to reconstruct $X$ with precisely zero probability of error.

Fig. 1. Source coding with feedback.

Fig. 2. Source coding without feedback.

A relaxed version of this problem is the asymptotically lossless setting, where the process $X$ is demanded with a vanishing error probability as the block length increases without bound. For this setting, it is easy to see that even unlimited rate on the feedback link does not change the rate required. In particular, for the system shown in Figure 2, Slepian and Wolf [3] have shown that the minimum rate required on the forward link equals the cutset bound. Since the addition of the feedback link does not alter the cutset bound, it follows that the rate required cannot be reduced any further.

This material is based upon work partially supported by DARPA ITMANET and Caltech's Lee Center for Advanced Networking.

In the zero-error setting, the cutset bound is not achievable for the system shown in Figure 2 (cf. [4], [5], [6], [7]) since $E_x$ has to distinguish between all possible pairs of source and side information sequences, not just those that are typical. We show that two-way communication between $E_x$ and $E_y$ allows them to decide whether or not their observed sequences are typical, and enables a tradeoff between the rates on the forward and the backward links. Of special interest are two extreme cases that are discussed in Examples 1 and 2. In the first example, asymptotically zero rate on the feedback link enables $E_x$ to operate at rates arbitrarily close to the cutset bound, while in the second, the sum rate required on the two links is bounded from below by $H(X)$.

It should be noted that the study of feedback under the zero-error criterion is not entirely new. Prior works on communication complexity have examined some aspects of this problem (see, for example, [8], [9], [10], [11]). In this work, we combine insights from both communication complexity theory and asymptotically lossless source coding to show that feedback is useful in the zero-error setting, even for networks where it does not help in the asymptotically zero-error setting.

The rest of the paper is organized as follows. We begin by defining the notation in Section II. The main results are presented in Section III. Theorem 1 gives an achievable rate region for the setup in Fig. 1. Theorems 2 and 3 give the exact rate region when the joint distribution of $X$ and $Y$ satisfies certain conditions. Finally, in Section IV, we examine a few examples to illustrate the main results.

II. Preliminaries 

Let $p_{XY}$ be a probability mass function on a finite alphabet $\mathcal{X} \times \mathcal{Y}$. Denote by $p_X$ (resp. $p_Y$) the marginal of $p_{XY}$ on $\mathcal{X}$ (resp. $\mathcal{Y}$). For each $n \in \mathbb{N}$, the collection of random variables $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is drawn i.i.d. from the distribution $p_{XY}$. Let $\{0,1\}^*$ denote the set of all finite-length sequences drawn from $\{0,1\}$. Let $l : \{0,1\}^* \to \mathbb{N}$ be the length function on $\{0,1\}^*$, i.e., if $b$ is a string of $n$ bits, then $l(b) = n$.

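Define an $n$-dimensional, $k$-interactive code $(f, g)$ to be a collection of $2k - 1$ functions

$$f_i : \mathcal{X}^n \times \mathcal{Y}^n \to \{0,1\}^* \quad \text{for } i = 1, 2, \ldots, k,$$
$$\text{and } g_i : \mathcal{X}^n \times \mathcal{Y}^n \to \{0,1\}^* \quad \text{for } i = 1, 2, \ldots, k-1$$

that satisfy the property that for each $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$, $f_1(\mathbf{x},\mathbf{y}) = \tilde{f}_1(\mathbf{x})$, and for $i = 1, 2, \ldots, k-1$, $f_{i+1}(\mathbf{x},\mathbf{y}) = \tilde{f}_{i+1}(\mathbf{x}, g_1(\mathbf{x},\mathbf{y}), \ldots, g_i(\mathbf{x},\mathbf{y}))$ and $g_i(\mathbf{x},\mathbf{y}) = \tilde{g}_i(\mathbf{y}, f_1(\mathbf{x},\mathbf{y}), \ldots, f_i(\mathbf{x},\mathbf{y}))$ for some collection of functions $\{\tilde{f}_i\}_{i=1}^{k}$ and $\{\tilde{g}_i\}_{i=1}^{k-1}$. In other words, each message depends only on the sender's own observation and on the messages it has received so far. We call a blocklength-$n$, $k$-interactive code $(f, g)$ a zero-error code if there exists a decoder function $h : (\{0,1\}^*)^k \times \mathcal{Y}^n \to \mathcal{X}^n$ such that for all $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$, $h(f_1(\mathbf{x},\mathbf{y}), f_2(\mathbf{x},\mathbf{y}), \ldots, f_k(\mathbf{x},\mathbf{y}), \mathbf{y}) = \mathbf{x}$. We allow $(f, g)$ to be a variable-length code. The average rate for the code thus defined is the pair $(R_X, R_Y)$, where

$$R_X = \frac{1}{n} \sum_{i=1}^{k} E\left[l(f_i(X^n, Y^n))\right] \quad \text{and} \quad R_Y = \frac{1}{n} \sum_{i=1}^{k-1} E\left[l(g_i(X^n, Y^n))\right].$$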

We say that a rate point $(R_X, R_Y)$ is zero-error achievable if, for some integers $n$ and $k$, there exists a blocklength-$n$, $k$-interactive zero-error code with average rate $(R_X, R_Y)$. The rate region $\mathcal{R}_Z(X,Y)$ is the closure of the set of all zero-error achievable rates. Let $A_\epsilon^{(n)}(X)$ denote the set of $\epsilon$-strongly typical sequences in $\mathcal{X}^n$. Similarly, define $A_\epsilon^{(n)}(Y)$ and $A_\epsilon^{(n)}(X,Y)$. The type class of a probability mass function $Q$ is denoted by $T^{(n)}(Q)$ (see [12] for details).

III. Results 

We derive an achievable region in Theorem 1. Towards this end, we first present a weaker version of the theorem in the following lemma.

Lemma 1: $\mathcal{R}_Z(X,Y) \supseteq \{(R_X, R_Y) : R_X \geq H(X|Y),\ R_X + R_Y \geq H(X)\}$.

Proof: Let $R > H(X|Y)$. Consider the following code construction.

Fix a block length $n \geq 1$. Partition $A_\epsilon^{(n)}(X)$ into $2^{nR}$ bins $\{\mathcal{B}_i : i = 1, 2, \ldots, 2^{nR}\}$ by assigning each $\mathbf{x} \in A_\epsilon^{(n)}(X)$ a bin chosen uniformly at random. Let $B : A_\epsilon^{(n)}(X) \to \{1, 2, \ldots, 2^{nR}\}$ denote the mapping from sequences in $A_\epsilon^{(n)}(X)$ to the corresponding bin number. For $i = 1, 2, \ldots, 2^{nR}$, let $I_i : \mathcal{B}_i \to \{1, 2, \ldots, |\mathcal{B}_i|\}$ be a numbering of the sequences in the $i$-th bin.

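Let $\mathbf{x}$ and $\mathbf{y}$ be the sequences observed by $E_x$ and $E_y$ respectively. Consider the blocklength-$n$, 2-interactive code that defines the following protocol:

1. $E_x$ sends $f_1(\mathbf{x},\mathbf{y})$, where

$$f_1(\mathbf{x},\mathbf{y}) = \begin{cases} 0 \cdot \mathbf{x} & \text{if } \mathbf{x} \notin A_\epsilon^{(n)}(X) \\ 1 \cdot B(\mathbf{x}) & \text{otherwise.} \end{cases}$$

2. If $f_1(\mathbf{x},\mathbf{y}) = 0 \cdot \mathbf{x}$, then the procedure stops. Else,

a. $E_y$ sends

$$g_1(\mathbf{x},\mathbf{y}) = \begin{cases} 0 & \text{if } \mathbf{y} \notin A_\epsilon^{(n)}(Y) \text{ or } (\tilde{\mathbf{x}}, \mathbf{y}) \notin A_\epsilon^{(n)}(X,Y)\ \ \forall\, \tilde{\mathbf{x}} \in \mathcal{B}_{B(\mathbf{x})} \\ 1 \cdot I_{B(\mathbf{x})}(\hat{\mathbf{x}}) & \text{otherwise,} \end{cases}$$

where

$$\hat{\mathbf{x}} = \arg\min_{\tilde{\mathbf{x}} \in \mathcal{B}_{B(\mathbf{x})} :\ (\tilde{\mathbf{x}}, \mathbf{y}) \in A_\epsilon^{(n)}(X,Y)} I_{B(\mathbf{x})}(\tilde{\mathbf{x}}).$$

b. $E_x$ sends

$$f_2(\mathbf{x},\mathbf{y}) = \begin{cases} 0 \cdot \mathbf{x} & \text{if } g_1(\mathbf{x},\mathbf{y}) = 0, \text{ or } g_1(\mathbf{x},\mathbf{y}) \neq 0 \text{ and } \hat{\mathbf{x}} \neq \mathbf{x} \\ 1 & \text{otherwise.} \end{cases}$$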



Since the mapping from $\mathbf{x}$ to the pair $(B(\mathbf{x}), I_{B(\mathbf{x})}(\mathbf{x}))$ is one-to-one, the above protocol ends with $E_y$ decoding $\mathbf{x}$ correctly for each $\mathbf{x} \in \mathcal{X}^n$. Let $P_n$ denote the probability that the sequence of transmissions is

$$1 \cdot B(\mathbf{x});\quad 1 \cdot I_{B(\mathbf{x})}(\mathbf{x});\quad 1.$$

Let $R_X^{(n)}$ and $R_Y^{(n)}$ denote the expected rates on the forward link and the backward link respectively. These can be bounded from above as

$$R_X^{(n)} \leq P_n (R + 2/n) + (1 - P_n)(H(X) + 2/n)$$
$$\text{and } R_Y^{(n)} \leq P_n \left((1/n)\, E[\log |\mathcal{B}_{B(X^n)}|]\right) + (1 - P_n)(1/n) = P_n (H(X) - R + \epsilon) + (1 - P_n)(1/n).$$

Following previous results on random binning (cf. [3]), it is easily seen that $P_n$ approaches 1 as $n$ grows without bound. Thus,

$$\limsup_{n \to \infty} R_X^{(n)} \leq R \quad \text{and} \quad \limsup_{n \to \infty} R_Y^{(n)} \leq H(X) - R + \epsilon.$$

Since $\epsilon$ is arbitrary, this proves the desired result. ■
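The protocol in the proof of Lemma 1 can be exercised on a toy source. The following is a minimal sketch, not the authors' implementation: the helper names (`is_typical`, `build_bins`, `run_protocol`, `decode`, `bec_setup`), the crude frequency-based typicality test, and the small parameters are all assumptions made for illustration. It uses the binary erasure source of Example 1 and represents each message as a tuple rather than a bit string.

```python
import itertools
import random
from collections import Counter


def is_typical(seq, pmf, eps):
    """Crude strong-typicality test (an assumption of this sketch): every
    empirical frequency is within eps of pmf, and zero-probability symbols
    never occur. Symbols absent from pmf are ignored."""
    counts = Counter(seq)
    n = len(seq)
    for symbol, prob in pmf.items():
        if prob == 0 and counts[symbol] > 0:
            return False
        if abs(counts[symbol] / n - prob) > eps:
            return False
    return True


def build_bins(typical_x, num_bins, rng):
    """Assign each typical x to a bin chosen uniformly at random."""
    bins = [[] for _ in range(num_bins)]
    bin_of = {}
    for x in typical_x:
        b = rng.randrange(num_bins)
        bin_of[x] = b
        bins[b].append(x)
    return bins, bin_of


def run_protocol(x, y, bins, bin_of, p_y, p_xy, eps):
    """Return the messages exchanged for the pair (x, y)."""
    if x not in bin_of:                       # x atypical: E_x sends 0.x
        return [("0", x)]
    b = bin_of[x]
    f1 = ("1", b)                             # E_x sends 1.B(x)
    g1 = ("0",)
    if is_typical(y, p_y, eps):               # E_y looks for the first
        for index, cand in enumerate(bins[b]):  # jointly typical candidate
            if is_typical(tuple(zip(cand, y)), p_xy, eps):
                g1 = ("1", index)
                break
    # E_x confirms the candidate or falls back to sending x in full.
    f2 = ("1",) if g1 == ("1", bins[b].index(x)) else ("0", x)
    return [f1, g1, f2]


def decode(messages, bins):
    """Decoder at E_y: uses only the messages (E_y knows its own g1)."""
    f1 = messages[0]
    if f1[0] == "0":
        return f1[1]
    _, g1, f2 = messages
    if f2[0] == "1":
        return bins[f1[1]][g1[1]]
    return f2[1]


def bec_setup(n, p, eps, num_bins, seed=0):
    """Toy binary erasure source: X uniform, Y erased with probability p."""
    x_alphabet, y_alphabet = (0, 1), (0, "e", 1)
    p_x = {0: 0.5, 1: 0.5}
    p_y = {0: (1 - p) / 2, "e": p, 1: (1 - p) / 2}
    p_xy = {(x, y): 0.0 for x in x_alphabet for y in y_alphabet}
    for x in x_alphabet:
        p_xy[(x, x)] = (1 - p) / 2
        p_xy[(x, "e")] = p / 2
    typical_x = [x for x in itertools.product(x_alphabet, repeat=n)
                 if is_typical(x, p_x, eps)]
    bins, bin_of = build_bins(typical_x, num_bins, random.Random(seed))
    return x_alphabet, y_alphabet, p_y, p_xy, bins, bin_of
```

Enumerating every pair $(\mathbf{x}, \mathbf{y})$ for a small $n$ and comparing `decode(...)` with $\mathbf{x}$ checks the zero-error property directly; the fallback message `("0", x)` is what keeps the code correct on atypical or ambiguous pairs.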
Next, using previous results on zero-error coding without feedback [4] (see Fig. 2), we improve the above rate region. Let $H_Z(X|Y)$ denote the minimum rate required for describing $X$ without error when $Y$ is known at the decoder. The following theorem gives an achievable region when a feedback link is present.

Theorem 1: $\mathcal{R}_Z(X,Y) \supseteq \{(R_X, R_Y) : R_X \geq H(X|Y),\ R_X + R_Y \geq H_Z(X|Y)\}$.

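Proof: Let $R > H_Z(X|Y)$. By the result of [4], for some block length $n$, there exists a function $c : \mathcal{X}^n \to \mathcal{C}$ satisfying the following properties:

1) Let $\mathbf{x}, \mathbf{x}' \in \mathcal{X}^n$ be distinct sequences such that there exists $\mathbf{y} \in \mathcal{Y}^n$ for which $p_{XY}(\mathbf{x}, \mathbf{y}) > 0$ and $p_{XY}(\mathbf{x}', \mathbf{y}) > 0$. Then, $c(\mathbf{x}) \neq c(\mathbf{x}')$.

2) $H(c(X^n)) = nR$.

Observe that knowing $c(\mathbf{x})$ is sufficient for $E_y$ to decode $\mathbf{x}$ with zero error. Therefore, $\mathcal{R}_Z(X,Y) \supseteq \frac{1}{n} \mathcal{R}_Z(c(X^n), Y^n)$. Further, $H(c(X^n)|Y^n) \geq nH(X|Y)$ from Slepian and Wolf's result [3].

Using Lemma 1, $\mathcal{R}_Z(c(X^n), Y^n) \supseteq \{(R_X, R_Y) : R_X \geq H(c(X^n)|Y^n),\ R_X + R_Y \geq H(c(X^n))\}$. Since $R > H_Z(X|Y)$ is arbitrary, it follows that $\mathcal{R}_Z(X,Y) \supseteq \{(R_X, R_Y) : R_X \geq H(X|Y),\ R_X + R_Y \geq H_Z(X|Y)\}$. ■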

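Theorem 2 shows that Theorem 1 is tight for all $p_{XY}$ such that $p_{XY}(x,y) > 0$ for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$.

Theorem 2: Let $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$ be random variables such that $p_{XY}(x,y) > 0$ for all $(x,y) \in \mathcal{X} \times \mathcal{Y}$. Then, $\mathcal{R}_Z(X,Y) = \{(R_X, R_Y) : R_X \geq H(X|Y),\ R_X + R_Y \geq H(X)\}$.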

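Proof: The achievability of the given rates follows from Theorem 1. For the converse, we use an argument inspired by communication complexity theory (cf. [9]). Let $(f,g)$ be an $n$-dimensional interactive code over $k$ sessions operating at the rate $(R_X, R_Y)$. Define the set $\mathcal{C}(f,g)$ to be the set of all possible codewords, i.e.,

$$\mathcal{C}(f,g) \triangleq \{(f,g)(\mathbf{x},\mathbf{y}) : (\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n\},$$

and let $\mathcal{D}(f,g)$ be a partition of $\mathcal{X}^n \times \mathcal{Y}^n$ into sets that are inverse images of singletons in $\mathcal{C}(f,g)$, i.e.,

$$\mathcal{D}(f,g) = \{\{(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n : (f,g)(\mathbf{x},\mathbf{y}) = c\} : c \in \mathcal{C}(f,g)\}.$$

Notice that our definition of an interactive code implies that if for some $D \in \mathcal{D}(f,g)$, $(\mathbf{x}_1, \mathbf{y}_1) \in D$ and $(\mathbf{x}_2, \mathbf{y}_2) \in D$, then $(\mathbf{x}_1, \mathbf{y}_2) \in D$ and $(\mathbf{x}_2, \mathbf{y}_1) \in D$ as well. Further, at the end of the transmission, $X^n$ has to be decoded at $E_y$ without error. Thus, whenever $(\mathbf{x}_1, \mathbf{y}) \in D$ and $(\mathbf{x}_2, \mathbf{y}) \in D$ for some $D \in \mathcal{D}(f,g)$, then $\mathbf{x}_1 = \mathbf{x}_2$. Therefore, every $D \in \mathcal{D}(f,g)$ is of the form $\{\mathbf{x}\} \times D_{\mathbf{x}}$ for some $\mathbf{x} \in \mathcal{X}^n$ and $D_{\mathbf{x}} \subseteq \mathcal{Y}^n$. For $\mathbf{x} \in \mathcal{X}^n$, let

$$\mathcal{D}_{\mathbf{x}}(f,g) = \{D \in \mathcal{D}(f,g) : D = \{\mathbf{x}\} \times D_{\mathbf{x}} \text{ for some } D_{\mathbf{x}} \subseteq \mathcal{Y}^n\}.$$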

For the code $(f, g)$, let $R_X$ and $R_Y$ denote the expected rates on the forward link and the backward link respectively. Since $\mathcal{C}(f,g)$ is a uniquely decodable code over the input alphabet $\mathcal{D}(f,g)$, from the converse to the source coding theorem it follows that

$$n(R_X + R_Y) \geq -\sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{D \in \mathcal{D}_{\mathbf{x}}(f,g)} \Pr(D) \log \Pr(D) \geq nH(X)$$

by the concavity of entropy, where $\Pr(D)$ denotes the probability that $(X^n, Y^n) \in D$. Finally, note that even if $Y$ is known at $E_x$, for successful decoding, we require the forward rate to be at least as large as $H(X|Y)$. Thus, $R_X \geq H(X|Y)$. ■

Fig. 3. Example of $G(p_{XY})$ for two different distributions: (a) a cycle-free p.m.f.; (b) a p.m.f. with a cycle.

For a probability mass function $p_{XY}$ on $\mathcal{X} \times \mathcal{Y}$, define $G(p_{XY})$, the connectivity graph of $X$ and $Y$, as a graph with vertices $\mathcal{X} \cup \mathcal{Y}$ and edges $\{(x,y) : p_{XY}(x,y) > 0\}$. We say that $p_{XY}$ is cycle-free if $G(p_{XY})$ has no cycles. See Fig. 3 for an example of such a probability mass function. We next show that the rate $(R_X, R_Y) = (H(X|Y), 0)$ is in the zero-error region $\mathcal{R}_Z(X,Y)$ when $p_{XY}$ is cycle-free.
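The cycle-free condition is easy to test mechanically. The sketch below is one possible check, assuming the p.m.f. is given as a dictionary from pairs $(x, y)$ to probabilities; the function name `is_cycle_free` and the union-find approach are choices made here for illustration, not part of the paper.

```python
def is_cycle_free(p_xy):
    """Return True if the connectivity graph G(p_xy) has no cycles.

    p_xy maps (x, y) pairs to probabilities. Vertices are tagged with
    "x" or "y" so that equal symbols in the two alphabets stay distinct.
    An edge that joins two vertices already in the same connected
    component closes a cycle (standard union-find cycle detection).
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for (x, y), prob in p_xy.items():
        if prob <= 0:
            continue                        # not an edge of G(p_xy)
        root_x, root_y = find(("x", x)), find(("y", y))
        if root_x == root_y:
            return False
        parent[root_x] = root_y
    return True


# Supports of the three examples discussed in Section IV (erasure
# probability 0.5 and crossover probability 0.1 are arbitrary choices).
erasure = {(0, 0): 0.25, (0, "e"): 0.25, (1, "e"): 0.25, (1, 1): 0.25}
symmetric = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
two_erasures = {(0, 0): 0.25, (0, "e1"): 0.125, (0, "e2"): 0.125,
                (1, "e1"): 0.125, (1, "e2"): 0.125, (1, 1): 0.25}
```

On these inputs the erasure source is cycle-free, while the symmetric source and the two-erasure source both contain a cycle.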

Theorem 3: Let $X$ and $Y$ be random variables drawn from a joint distribution $p_{XY}$ that is cycle-free. Then, $\mathcal{R}_Z(X,Y) = \{(R_X, R_Y) : R_X \geq H(X|Y),\ R_Y \geq 0\}$.

Proof: The converse follows immediately from the Slepian-Wolf problem since the rate required on the forward link for zero-error coding is no less than the rate required for the asymptotically lossless Slepian-Wolf code. We now show the achievability of the claimed rates.

As in the proof of Theorem 1, let $R > H(X|Y)$ and partition $A_\epsilon^{(n)}(X)$ into $2^{nR}$ bins $\{\mathcal{B}_i : i = 1, 2, \ldots, 2^{nR}\}$ by assigning each $\mathbf{x} \in A_\epsilon^{(n)}(X)$ a bin chosen uniformly at random. Let $B : A_\epsilon^{(n)}(X) \to \{1, 2, \ldots, 2^{nR}\}$ denote the corresponding mapping from sequences in $A_\epsilon^{(n)}(X)$ to bin numbers. Let $\mathbf{x}$ and $\mathbf{y}$ be $n$-length sequences observed at $E_x$ and $E_y$ respectively. Denote the empirically observed type class of $\mathbf{x}$ by $T^{(n)}(\mathbf{x})$. Note that knowing $T^{(n)}(\mathbf{x})$ is sufficient for $E_y$ to determine if $(\mathbf{x}, \mathbf{y}) \in A_\epsilon^{(n)}(X,Y)$.

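Consider the following protocol.

1. $E_x$ sends $f_1(\mathbf{x},\mathbf{y}) = T^{(n)}(\mathbf{x})$.

2. $E_y$ sends

$$g_1(\mathbf{x},\mathbf{y}) = \begin{cases} 1 & \text{if } (\mathbf{x},\mathbf{y}) \in A_\epsilon^{(n)}(X,Y) \\ 0 & \text{otherwise.} \end{cases}$$

3. $E_x$ sends

$$f_2(\mathbf{x},\mathbf{y}) = \begin{cases} B(\mathbf{x}) & \text{if } g_1(\mathbf{x},\mathbf{y}) = 1 \\ \mathbf{x} & \text{otherwise.} \end{cases}$$

4. If there is a unique $\mathbf{x}' \in \mathcal{B}_{B(\mathbf{x})}$ such that $(\mathbf{x}', \mathbf{y}) \in A_\epsilon^{(n)}(X,Y)$, or if $g_1(\mathbf{x},\mathbf{y}) = 0$, transmission stops. Otherwise, $E_y$ sends $g_2(\mathbf{x},\mathbf{y}) = 0$.

5. $E_x$ sends $f_3(\mathbf{x},\mathbf{y}) = \mathbf{x}$.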

From Lemma 2, it follows that given the individual type classes of $\mathbf{x}$ and $\mathbf{y}$, the joint type class is uniquely determined. Therefore, the above protocol always outputs the correct value $\mathbf{x}$. Finally, using the same argument as in Theorem 1, as long as $R > H(X|Y)$, the expected rate on the forward link for the above code approaches $R$ as $n$ grows without bound, while the rate on the backward link approaches 0.

■ 
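The overhead terms in the protocol above can be made explicit. The following is a sketch under two assumptions that are not stated in the proof: the type class $T^{(n)}(\mathbf{x})$ is described by listing its $|\mathcal{X}|$ symbol counts, each an integer in $\{0, 1, \ldots, n\}$, and each feedback message $g_1$, $g_2$ costs one bit.

```latex
\frac{1}{n}\, l\bigl(f_1(\mathbf{x},\mathbf{y})\bigr)
  \le \frac{|\mathcal{X}| \,\lceil \log_2 (n+1) \rceil}{n}
  \xrightarrow[n \to \infty]{} 0,
\qquad
R_Y^{(n)} \le \frac{2}{n} \xrightarrow[n \to \infty]{} 0 .
```

Under these assumptions, describing the type costs a vanishing forward rate and the feedback link carries at most two bits per block, so only the bin index $B(\mathbf{x})$ contributes to the limiting forward rate.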

IV. Discussion 

We have shown that for every pair $(X, Y)$ such that the rate $H(X|Y)$ on the forward link is not achievable without feedback, the addition of the feedback link enables us to lower the forward transmission rate. In particular, for certain classes of sources, Theorem 3 shows that even asymptotically zero feedback is useful. The following example illustrates this.

Example 1 (Binary Erasure Channel): Let $X$ be distributed uniformly on $\mathcal{X} = \{0, 1\}$ and let $Y$ be distributed on $\mathcal{Y} = \{0, e, 1\}$ with the transition probability

$$p_{Y|X}(y|x) = \begin{cases} 1 - p & \text{if } y = x \\ p & \text{if } y = e. \end{cases}$$

From prior results (cf. [4]), it follows that without feedback, the minimum rate for zero-error coding of $X$ is $H(X) = 1$. On the other hand, Theorem 3 shows that even with asymptotically zero feedback, a rate of $H(X|Y) = p$ is achievable on the forward link.
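The value $H(X|Y) = p$ follows from conditioning on whether $Y$ is erased. As a short worked step (assuming base-2 logarithms, so that the uniform binary source has $H(X) = 1$):

```latex
H(X|Y) = \sum_{y \in \mathcal{Y}} p_Y(y)\, H(X \mid Y = y)
       = \Pr(Y \in \{0, 1\}) \cdot 0 + \Pr(Y = e) \cdot H(X)
       = p \cdot 1 = p .
```

An unerased $Y$ reveals $X$ exactly, while an erasure occurs independently of $X$ and leaves it uniform.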

An interesting contrast to the above example is provided by 
the following example. 

Example 2 (Binary Symmetric Channel): Let $X$ be distributed uniformly on $\mathcal{X} = \{0, 1\}$ and let $Y$ be distributed on $\mathcal{Y} = \{0, 1\}$ with the following transition probability

$$p_{Y|X}(y|x) = \begin{cases} 1 - p & \text{if } y = x \\ p & \text{if } y \neq x. \end{cases}$$

The minimum rate possible without feedback for this example is the same as that in Example 1. In contrast to Example 1, however, asymptotically zero feedback does not reduce the minimum rate required on the forward link. Instead, Theorem 2 shows that a non-zero rate on the feedback link can be traded for a lower rate on the forward link. In particular, $\mathcal{R}_Z(X,Y)$ is given by $\{(R_X, R_Y) : R_X \geq H(p),\ R_X + R_Y \geq 1\}$.
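To make the tradeoff in Example 2 concrete, the region can be evaluated numerically. This is a small sketch; the function names `binary_entropy` and `in_bsc_region` are hypothetical, $H(p)$ is taken to be the binary entropy function in bits, and $0 < p < 1$ is assumed so that Theorem 2 applies.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def in_bsc_region(r_x, r_y, p):
    """Membership test for the region of Example 2:
    R_X >= H(p) and R_X + R_Y >= 1 (assumes 0 < p < 1)."""
    return r_x >= binary_entropy(p) and r_x + r_y >= 1.0
```

For instance, with $p = 0.11$ (so $H(p)$ is just under 0.5), the point $(R_X, R_Y) = (0.5, 0.5)$ lies in the region, whereas $(0.5, 0)$ does not: lowering the forward rate below 1 has to be paid for on the feedback link.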

Finally, note that $p_{XY}$ being cycle-free is not a necessary condition for the conclusion of Theorem 3 to hold. This is shown in the following example.

Example 3 (Binary Erasure Channel with Two Erasures): Let $X$ be distributed uniformly on $\mathcal{X} = \{0, 1\}$ and let $Y$ be distributed on $\mathcal{Y} = \{0, e_1, e_2, 1\}$ with the transition probability

$$p_{Y|X}(y|x) = \begin{cases} 1 - p & \text{if } y = x \\ p/2 & \text{if } y = e_1 \text{ or } e_2. \end{cases}$$

Lemma 3, which is proved in the Appendix, shows that this example is, in fact, equivalent to Example 1. Thus, a rate $H(X|Y) = p$ on the forward link can be achieved with asymptotically zero rate on the feedback link, even though $p_{XY}$ is not cycle-free.
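The hypothesis of Lemma 3, $H(X|Y) = H(X|f(Y))$, can be checked numerically for this example by merging the two erasure symbols. The sketch below is illustrative only; the helper names (`conditional_entropy`, `two_erasure_source`, `merge_side_information`, `merge_erasures`) are assumptions, and entropies are in bits.

```python
import math


def conditional_entropy(p_xy):
    """H(X|Y) in bits for a joint p.m.f. given as {(x, y): probability}."""
    p_y = {}
    for (_, y), prob in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + prob
    return -sum(prob * math.log2(prob / p_y[y])
                for (_, y), prob in p_xy.items() if prob > 0)


def two_erasure_source(p):
    """Joint p.m.f. of Example 3: X uniform, each erasure has probability p/2."""
    return {(0, 0): (1 - p) / 2, (1, 1): (1 - p) / 2,
            (0, "e1"): p / 4, (0, "e2"): p / 4,
            (1, "e1"): p / 4, (1, "e2"): p / 4}


def merge_side_information(p_xy, f):
    """Joint p.m.f. of (X, f(Y)) obtained by relabelling y as f(y)."""
    merged = {}
    for (x, y), prob in p_xy.items():
        key = (x, f(y))
        merged[key] = merged.get(key, 0.0) + prob
    return merged


def merge_erasures(y):
    """The map f of Lemma 3 for this example: both erasures become "e"."""
    return "e" if y in ("e1", "e2") else y
```

For any erasure probability $p$, `conditional_entropy(two_erasure_source(p))` and the conditional entropy of the merged source should both equal $p$ up to floating-point error, which is the equality Lemma 3 requires.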

Appendix 

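Lemma 2: Let $T^{(n)}(Q_X) \subseteq \mathcal{X}^n$, $T^{(n)}(Q_Y) \subseteq \mathcal{Y}^n$, and $T^{(n)}(Q_{XY}) \subseteq \mathcal{X}^n \times \mathcal{Y}^n$ be type classes that are consistent with each other, i.e., the marginal of $Q_{XY}$ on $\mathcal{X}$ (resp. $\mathcal{Y}$) is $Q_X$ (resp. $Q_Y$). Further, assume that $Q_{XY}$ is cycle-free.

Under the above conditions, if $\mathbf{x} \in T^{(n)}(Q_X)$, $\mathbf{y} \in T^{(n)}(Q_Y)$, and $Q_{XY}(\mathbf{x}, \mathbf{y}) > 0$ (i.e., every pair $(x_i, y_i)$ lies in the support of $Q_{XY}$), then $(\mathbf{x}, \mathbf{y}) \in T^{(n)}(Q_{XY})$.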

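Proof: Let $N(a, \mathbf{x})$ denote the number of occurrences of a symbol $a \in \mathcal{X}$ in the sequence $\mathbf{x}$. Likewise, define $N((a,b), (\mathbf{x},\mathbf{y}))$ to be the number of simultaneous occurrences of the pair $(a,b)$ in the sequence $(\mathbf{x}, \mathbf{y})$. To prove the lemma, we apply induction on the size of $\mathcal{X} \cup \mathcal{Y}$. The smallest non-trivial case corresponds to $|\mathcal{X} \cup \mathcal{Y}| = 3$, i.e., either $|\mathcal{X}| = 2$ and $|\mathcal{Y}| = 1$ or $|\mathcal{X}| = 1$ and $|\mathcal{Y}| = 2$. In the first case, $N((a,b), (\mathbf{x},\mathbf{y})) = N(a, \mathbf{x})$ for all $(a,b) \in \mathcal{X} \times \mathcal{Y}$. Thus, $\mathbf{x} \in T^{(n)}(Q_X)$ implies that $(\mathbf{x}, \mathbf{y}) \in T^{(n)}(Q_{XY})$. A similar argument holds for the second case.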

Assume that the lemma is true whenever $|\mathcal{X} \cup \mathcal{Y}| < K$. Suppose now that for $Q_{XY}$, $|\mathcal{X} \cup \mathcal{Y}| = K$. Notice that if $Q_{XY}$ has no cycles, then the connectivity graph $G(Q_{XY})$ on $\mathcal{X} \cup \mathcal{Y}$ has at least one vertex with exactly one edge connected to it. To see this, pick any vertex $v_1$ in $\mathcal{X} \cup \mathcal{Y}$ and construct a sequence of vertices $v_1, v_2, \ldots$ such that $(v_i, v_{i+1})$ are pairs of connected vertices and $v_{i+2} \neq v_i$ for each $i \geq 1$. Since $\mathcal{X} \cup \mathcal{Y}$ is a finite set, either $v_j = v_i$ for some $j > i$ or the sequence terminates at a vertex $v_k$ which has exactly one edge connected to it. Since $Q_{XY}$ is cycle-free, it follows that the second condition must be true. Further, the vertex $v_k$ also satisfies the property that the transition probability from $v_k$ to its neighbour is 1 under $Q_{XY}$.

Fix any $\mathbf{x} \in T^{(n)}(Q_X)$ and $\mathbf{y} \in T^{(n)}(Q_Y)$ such that $Q_{XY}(\mathbf{x}, \mathbf{y}) > 0$. Let $v$ be a vertex in $G(Q_{XY})$ that has exactly one connected edge. The following argument shows that $(\mathbf{x}, \mathbf{y}) \in T^{(n)}(Q_{XY})$. Since the argument is symmetrical in $X$ and $Y$, without loss of generality, assume that $v \in \mathcal{X}$ and let $w \in \mathcal{Y}$ be the vertex connected to $v$ in $G(Q_{XY})$. Since $\mathbf{x} \in T^{(n)}(Q_X)$, $N(v, \mathbf{x}) = nQ_X(v)$ and therefore, $N((v,w), (\mathbf{x},\mathbf{y})) = nQ_X(v) = nQ_{XY}(v,w)$.

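Now, let $\mathcal{X}' = \mathcal{X} \setminus \{v\}$ and $\mathcal{Y}' = \mathcal{Y}$. Define probability mass functions $Q'_X$ on $\mathcal{X}'$ and $Q'_{XY}$ on $\mathcal{X}' \times \mathcal{Y}'$ as follows:

$$Q'_X(x) \triangleq Q_X(x) / (1 - Q_X(v)) \quad \forall\, x \in \mathcal{X}' \quad \text{and}$$
$$Q'_{XY}(x,y) \triangleq Q_{XY}(x,y) / (1 - Q_{XY}(v,w)) \quad \forall\, (x,y) \in \mathcal{X}' \times \mathcal{Y}',$$

and let $Q'_Y$ denote the marginal of $Q'_{XY}$ on $\mathcal{Y}'$.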

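Let $(\mathbf{x}', \mathbf{y}')$ be the subsequence of $(\mathbf{x}, \mathbf{y})$ of length $n' = n - N(v, \mathbf{x})$ obtained by deleting the indices that correspond to occurrences of $(v,w)$ in $(\mathbf{x}, \mathbf{y})$. It can be verified that $\mathbf{x}' \in T^{(n')}(Q'_X)$ and $\mathbf{y}' \in T^{(n')}(Q'_Y)$. Since $Q'_{XY}$ is cycle-free and $|\mathcal{X}' \cup \mathcal{Y}'| = K - 1$, by the induction hypothesis, $(\mathbf{x}', \mathbf{y}') \in T^{(n')}(Q'_{XY})$. Hence, for all $(x,y) \in \mathcal{X} \times \mathcal{Y} \setminus \{(v,w)\}$,

$$N((x,y), (\mathbf{x},\mathbf{y})) = N((x,y), (\mathbf{x}',\mathbf{y}'))$$
$$= n' Q'_{XY}(x,y)$$
$$= (n - nQ_{XY}(v,w))\, Q_{XY}(x,y) \times \frac{1}{1 - Q_{XY}(v,w)}$$
$$= nQ_{XY}(x,y).$$

This shows that $(\mathbf{x}, \mathbf{y}) \in T^{(n)}(Q_{XY})$. ■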
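Lemma 2 can be sanity-checked by brute force on small alphabets. The sketch below enumerates all sequence pairs whose symbol pairs stay inside a given support and records which joint types occur for each pair of marginal types; the function name `joint_types_by_marginals` and the representation of a type as a frozen set of symbol counts are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def joint_types_by_marginals(x_alphabet, y_alphabet, support, n):
    """Map (type of x, type of y) to the set of joint types observed.

    Only sequence pairs (x, y) of length n whose symbol pairs all lie in
    `support` are considered. A type is stored as a frozenset of
    (symbol, count) items, which identifies the empirical distribution.
    """
    table = {}
    for x in product(x_alphabet, repeat=n):
        for y in product(y_alphabet, repeat=n):
            pairs = list(zip(x, y))
            if any(pair not in support for pair in pairs):
                continue
            key = (frozenset(Counter(x).items()),
                   frozenset(Counter(y).items()))
            joint = frozenset(Counter(pairs).items())
            table.setdefault(key, set()).add(joint)
    return table


# Cycle-free support (erasure source) versus a support with a cycle
# (symmetric source).
erasure_support = {(0, 0), (0, "e"), (1, "e"), (1, 1)}
symmetric_support = {(0, 0), (0, 1), (1, 0), (1, 1)}
```

For the cycle-free erasure support, every pair of marginal types maps to exactly one joint type, as the lemma asserts. For the symmetric support the table contains marginal pairs with several joint types; for example, $\mathbf{x} = (0,1)$ paired with $\mathbf{y} = (0,1)$ or with $\mathbf{y} = (1,0)$ gives the same marginal types but different joint types.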
Lemma 3: Let $f : \mathcal{Y} \to \mathcal{Z}$ be such that $H(X|Y) = H(X|f(Y))$. Then, $\mathcal{R}_Z(X,Y) = \mathcal{R}_Z(X, f(Y))$.

Proof: Clearly, $\mathcal{R}_Z(X,Y) \supseteq \mathcal{R}_Z(X, f(Y))$ since $E_y$ can compute $f(Y)$ and hence operate at all rate points in $\mathcal{R}_Z(X, f(Y))$. To see the reverse inclusion, define a $\mathcal{Y}$-valued random variable $Y'$ satisfying the Markov chain $Y - f(Y) - Y'$ and let $p_{Y'|f(Y)}(y|z) = p_{Y|f(Y)}(y|z)$ for all $(y,z) \in \mathcal{Y} \times \mathcal{Z}$. It follows that $H(X|Y) = H(X|Y')$, $p_Y = p_{Y'}$, and therefore, $p_{X|Y}(x|y) = p_{X|Y'}(x|y)$. Hence, the joint distribution of $X$ and $Y$ is the same as that of $X$ and $Y'$, which implies that $\mathcal{R}_Z(X,Y) = \mathcal{R}_Z(X,Y')$. Finally, note that given $f(Y)$, $E_y$ can generate $Y'$ randomly. Therefore, $\mathcal{R}_Z(X, f(Y)) \supseteq \mathcal{R}_Z(X,Y') = \mathcal{R}_Z(X,Y)$. ■